Search for: All records (page 1 of 1; 2 total resources)
Authors / Contributors: Livesay, Michael (2); Hajek, Bruce (1); Lubars, Joseph (1); Srikant, R (1); Winnicki, Anna (1)
In many applications of reinforcement learning, the underlying system dynamics are known, but computing the optimal policy is still difficult because the state space can be enormously large. For example, Shannon famously estimated the number of states in chess to be approximately 10^120. To handle such enormous state spaces, policy iteration algorithms make two major approximations: it is assumed that the value function lies in a lower-dimensional space, and only a few steps of a trajectory (called a rollout) are used to evaluate a policy. Using a counterexample, we show that these approximations can lead to the divergence of policy iteration. We then show that sufficient lookahead in the policy improvement step mitigates this divergence and leads to algorithms with bounded errors, and that these errors can be controlled by appropriately choosing the amounts of lookahead and rollout.
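The lookahead-and-rollout scheme described in this abstract can be sketched on a toy MDP with known dynamics. Everything below (the random MDP, the feature dimension `d`, and the parameters `H` and `m`) is illustrative rather than the paper's actual setup: `H` Bellman-optimality backups stand in for lookahead, an `m`-step rollout under the greedy policy stands in for approximate policy evaluation, and projection onto a random low-dimensional feature subspace stands in for the value-function approximation.

```python
import numpy as np

# Toy MDP with known dynamics (sizes are illustrative).
rng = np.random.default_rng(0)
n, n_actions, gamma = 20, 2, 0.9
P = rng.dirichlet(np.ones(n), size=(n_actions, n))  # P[a, s] = next-state distribution
R = rng.uniform(0, 1, size=(n_actions, n))          # R[a, s] = expected reward

def bellman(V):
    """One exact Bellman optimality backup (possible because dynamics are known)."""
    Q = R + gamma * np.einsum('asj,j->as', P, V)
    return Q.max(axis=0), Q.argmax(axis=0)

def lookahead_rollout_pi(H=3, m=5, d=4, iters=30):
    """Approximate policy iteration: H-step lookahead for policy improvement,
    m-step rollout for policy evaluation, and a rank-d linear value-function
    approximation applied via projection onto the feature subspace."""
    Phi = rng.standard_normal((n, d))       # random features (illustrative)
    proj = Phi @ np.linalg.pinv(Phi)        # orthogonal projector onto span(Phi)
    V = np.zeros(n)
    for _ in range(iters):
        W = V.copy()
        for _ in range(H):                  # H-step lookahead: repeated backups
            W, pi = bellman(W)
        idx = np.arange(n)
        for _ in range(m):                  # m-step rollout under greedy policy pi
            W = R[pi, idx] + gamma * P[pi, idx] @ W
        V = proj @ W                        # project back onto the feature space
    return V, pi
```

The projection composed with Bellman backups need not be a contraction, which is the mechanism behind the divergence the abstract mentions; making `H + m` large shrinks the pre-projection map and keeps the iterates bounded, loosely mirroring the role of lookahead and rollout in the result.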
-
Hajek, Bruce; Livesay, Michael (2019 58th IEEE Conference on Decision and Control). The theory of mean field games is a tool for understanding noncooperative dynamic stochastic games with a large number of players. Much of the theory has evolved under conditions ensuring uniqueness of the mean field game Nash equilibrium. In some situations, however, typically involving symmetry breaking, non-uniqueness of solutions is an essential feature. To investigate the nature of non-unique solutions, this paper focuses on a technically simple setting: players have one of two states, dynamics are in continuous time, the game is symmetric in the players, and players are restricted to Markov strategies. All mean field game Nash equilibria are identified for a symmetric follow-the-crowd game. Such equilibria correspond to symmetric $\epsilon$-Nash Markov equilibria for $N$ players, with $\epsilon$ converging to zero as $N$ goes to infinity. In contrast to the mean field game, there is a unique Nash equilibrium for finite $N$. It is shown that fluid limits arising from the Nash equilibria for finite $N$ as $N$ goes to infinity are mean field game Nash equilibria, and evidence is given supporting the conjecture that such limits, among all mean field game Nash equilibria, are the ones that are stable fixed points of the mean field best response mapping.
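The non-uniqueness phenomenon can be illustrated with a stationary caricature of a two-state follow-the-crowd game. The logistic best response and the parameter `beta` below are illustrative stand-ins, not the paper's continuous-time model; the point is only that the mean field best-response map can have several fixed points.

```python
import numpy as np

def best_response_fraction(m, beta=8.0):
    """Fraction of players choosing state 1 when the crowd fraction in state 1
    is m. A logistic (softened) response with inverse temperature beta stands
    in for an exact best response; the flow reward for state 1 is m and for
    state 0 is 1 - m, so players prefer to follow the crowd."""
    return 1.0 / (1.0 + np.exp(-beta * (m - (1.0 - m))))

def find_fixed_points(grid=2001, tol=1e-3):
    """Scan [0, 1] for approximate fixed points m = BR(m) of the mean field
    best-response map; each cluster of hits marks one equilibrium."""
    ms = np.linspace(0.0, 1.0, grid)
    gaps = np.abs(best_response_fraction(ms) - ms)
    return [float(m) for m, g in zip(ms, gaps) if g < tol]
```

With `beta = 8` the scan finds fixed points near 0, at 1/2, and near 1. The slope of the best-response map at m = 1/2 is beta/2 > 1, so the symmetric equilibrium is an unstable fixed point of best-response iteration while the two symmetry-broken equilibria are stable, loosely echoing the stability conjecture in the abstract.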